Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing

Authors

  • Peng Sun
  • Jiong Guo
  • Jan Baumbach
Abstract

The explosion of biological data has strongly influenced the focus of today's biological research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. Such data are modelled as bi-partite graphs in which nodes correspond to the different data points (mutations and diseases, for instance) and weighted edges represent associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. We contribute an exact algorithm based on fixed-parameter tractability. We first evaluated its performance on artificial graphs. We then applied our Java implementation, as an example, to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved genotype-to-phenotype associations. We believe that our results will serve as guidelines for further wet-lab investigations. More generally, our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge, it is the fastest exact method for the weighted bi-cluster editing problem.
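To make the problem statement concrete, the following is a minimal sketch of weighted bi-cluster editing on a toy graph. It is not the authors' fixed-parameter algorithm or their Java implementation: it is a naive brute-force search that is only feasible for a handful of nodes. The cost model is an assumption made for illustration: each pair (left node, right node) carries a signed weight, a positive weight means the edge is present and costs that much to delete, and a negative weight means the edge is absent and costs its absolute value to insert. The function names and the node names (m1, d1, ...) are hypothetical.

```python
from itertools import product


def editing_cost(sim, label):
    """Total cost of editing the graph so that each label class is a bi-clique.

    sim   : dict mapping (left_node, right_node) -> signed weight
            (assumed convention: > 0 edge present, < 0 edge absent).
    label : dict mapping every node to a cluster id.
    """
    cost = 0.0
    for (u, v), w in sim.items():
        same_cluster = label[u] == label[v]
        if same_cluster and w < 0:
            cost += -w  # missing edge inside a cluster must be inserted
        elif not same_cluster and w > 0:
            cost += w   # edge between two clusters must be deleted
    return cost


def brute_force_bicluster_editing(sim):
    """Try every assignment of nodes to clusters and keep the cheapest one.

    Exponential in the number of nodes; a teaching aid only.
    Assumes left and right node names are distinct.
    """
    left = sorted({u for u, _ in sim})
    right = sorted({v for _, v in sim})
    nodes = left + right
    best_cost, best_label = float("inf"), None
    for assignment in product(range(len(nodes)), repeat=len(nodes)):
        label = dict(zip(nodes, assignment))
        cost = editing_cost(sim, label)
        if cost < best_cost:
            best_cost, best_label = cost, label
    return best_cost, best_label


# Toy instance: three hypothetical mutations and two hypothetical diseases.
example = {
    ("m1", "d1"): 2.0, ("m1", "d2"): -1.0,
    ("m2", "d1"): 1.5, ("m2", "d2"): 0.5,
    ("m3", "d1"): -2.0, ("m3", "d2"): 3.0,
}
best_cost, best_label = brute_force_bicluster_editing(example)
```

In this toy instance the present edges form a path m1, d1, m2, d2, m3, which is not a disjoint union of bi-cliques. The cheapest repair deletes the weakest edge (m2, d2), leaving the bi-cliques {m1, m2, d1} and {m3, d2}.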

Similar resources

BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data

BACKGROUND: The explosion of biological data has dramatically reformed today's biology research. The biggest challenge to biologists and bioinformaticians is the integration and analysis of large quantities of data to provide meaningful insights. One major problem is the combined analysis of data from different types. Bi-cluster editing, as a special case of clustering, which partitions two differ...

Efficient Large-scale bicluster editing

The explosion of the biological data has dramatically reformed today’s biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as simultaneous clustering or co-clustering, has been successfully utilized to discover local patterns in gene expression data and si...

Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering

The explosion of the biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data an...

Bi-level clustering of mixed categorical and numerical biomedical data

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for 'Bi-Level Clustering of Mixed categorical and numerical data types'. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedi...

An algorithm for integrated worker assignment, mixed-model two-sided assembly line balancing and bottleneck analysis

This paper addresses multi-objective mixed-model two-sided assembly line balancing and worker assignment with bottleneck analysis when the task times depend on the worker's skill. This problem is known to be NP-hard; thus, a hybrid cyclic-hierarchical algorithm is presented for solving it. The algorithm is based on Particle Swarm Optimization (PSO) and Theory of Constraints (TOC) an...

Journal:
  • Journal of integrative bioinformatics

Volume 9, Issue 2

Publication date: 2012